C2CU : A CUDA C Program Generator for Bulk Execution of a Sequential Algorithm

Authors

  • Daisuke Takafuji
  • Koji Nakano
  • Yasuaki Ito
Abstract

We present a time-optimal implementation for bulk execution of an oblivious sequential algorithm. Our second contribution is C2CU, a tool that automatically generates a CUDA C program for bulk execution of an oblivious sequential algorithm.
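The abstract does not show code, so the following is only a minimal sketch of what bulk execution can look like, not the output of C2CU. It assumes that "oblivious" means the sequence of memory addresses accessed does not depend on the input values, uses in-place prefix sums as the example algorithm, and assumes a column-wise layout in which element `i` of input `j` is stored at `a[i * p + j]`. The names `bulk_prefix_sums` and `run_bulk_prefix_sums` are hypothetical.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread runs the sequential prefix-sums loop on one input.
// Assumed layout: element i of input j is at a[i * p + j], so at every
// step i, threads j and j+1 read and write adjacent words.
__global__ void bulk_prefix_sums(int *a, int n, int p)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // index of the input
    if (j >= p) return;
    for (int i = 1; i < n; ++i)
        a[i * p + j] += a[(i - 1) * p + j];
}

// Runs p prefix-sums instances of length n and checks them on the host.
// Returns the number of mismatches, or -1 if a CUDA call fails.
int run_bulk_prefix_sums(int n, int p)
{
    std::vector<int> h(static_cast<size_t>(n) * p);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < p; ++j)
            h[i * p + j] = i + j;  // input j is j, j+1, ..., j+n-1

    int *d = nullptr;
    size_t bytes = h.size() * sizeof(int);
    if (cudaMalloc(&d, bytes) != cudaSuccess) return -1;
    if (cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d);
        return -1;
    }

    int threads = 256;
    int blocks = (p + threads - 1) / threads;
    bulk_prefix_sums<<<blocks, threads>>>(d, n, p);
    cudaError_t err = cudaGetLastError();            // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    if (err == cudaSuccess)
        err = cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return -1;

    // Prefix sum of j, j+1, ..., j+i is (i+1)*j + i*(i+1)/2.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < p; ++j)
            if (h[i * p + j] != (i + 1) * j + i * (i + 1) / 2) ++mismatches;
    return mismatches;
}

int main()
{
    int result = run_bulk_prefix_sums(8, 1024);
    std::printf("mismatches: %d\n", result);
    return result == 0 ? 0 : 1;
}
```

Because the loop visits the same indices regardless of the data, every thread is at the same step `i` at the same time, which is what makes a layout like the assumed one attractive for coalesced global memory access.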


Similar resources

Parallelization of the Cuckoo Search Using CUDA Architecture

Cuckoo Search is one of the recent swarm intelligence metaheuristics. It has been successfully applied to a number of optimization problems, but is still not very well researched. In this paper we present a parallelized version of the Cuckoo Search algorithm. The parallelization is implemented using the CUDA architecture. The algorithm is significantly changed compared to the sequential version. Changes...


Parallelization of the Simulation of the Two-Stream Instability Using the PIC Method

Two-stream instability in plasma is simulated by the PIC method. The execution time of the sequential and parallelizable sections of the program is measured. The sequential program is parallelized with the help of MPI functions. Then, the execution time of the sequential program versus the number of grid points and the execution time of the parallel program on 3 and 5 processors versus the nu...


A Hybrid Approach to Parallel Connected Component Labeling Using CUDA

Connected component labeling (CCL) is a mandatory step in image segmentation where each object in an image is identified and uniquely labeled. Sequential CCL is a time-consuming operation and thus is often implemented within a parallel processing framework to reduce execution time. Several parallel CCL methods have been proposed in the literature. Among them are NSZ label equivalence (NSZLE) meth...


Comparison of Parallel CUDA and OpenMP Implementations of Particle Swarm Optimization

Since the physical constraints on micro computing devices have forced researchers to design next-generation chips, parallelization and distributed computing are growing in importance. In this study, a sequential implementation of the Particle Swarm Optimization algorithm is converted into a concurrent version, which is executed on the cores of both CPU and GPU. For this rea...


Analysis of a Step-Based Watershed Algorithm Using CUDA

This paper proposes and develops a parallel algorithm for the watershed transform, with application on graphics hardware. The existing proposals are discussed and their aspects briefly analysed. The algorithm is proposed as a procedure of four steps, where each step performs a task using different approaches inspired by existing techniques. The algorithm is implemented using the CUDA libraries an...



Journal:
  • Concurrency and Computation: Practice and Experience

Volume 29  Issue 

Pages  -

Publication date 2014